Searching for periods in X-ray observations using Kuiper's test.

Application to the ROSAT PSPC archive

Abstract. We use Kuiper's test to detect periodicities in X-ray and gamma-ray observations. Like Rayleigh's test, it uses the individual photon arrival times, and is therefore well suited to the analysis of faint sources. Our method makes it possible to take into account the discontinuities in the observation, and to completely remove the contamination that results from them. This makes it particularly well suited to the search for periods that are long compared to the total observation duration. We propose a semi-analytical approach to determine the effective number of trial frequencies when searching for unknown periods over a frequency range. This approach can be easily adapted to other tests. We show that, using Kuiper's test, we can recover periods in frequency domains where other tests are completely confused by contamination. We finally search the entire ROSAT Position-Sensitive Proportional Counter (PSPC) archive for long periods, and find 28 new periodic-source candidates.

1. Introduction

Considerable effort has been devoted to the search for periodic signals throughout the electromagnetic spectrum. Because of the idiosyncrasies of astrophysical observations, different methods must be used depending on the type of object and the wavelength range. Four test families seem to dominate the period-detection "market". The calculation of the Fourier power spectral density (e.g., Press et al. 1993) using a fast Fourier transform (FFT) is suited to evenly spaced (or evenly binned) observations. The Lomb-Scargle periodogram (Lomb 1976; Scargle 1982; Horne & Baliunas 1986), a discrete Fourier transform method, can be used for unevenly spaced flux measurements. Epoch folding (EF) (e.g., Leahy et al. 1983a) can be used in the same conditions or for individual photons, but requires binning according to the phase. Rayleigh's test (e.g., Gibson et al. 1982; Fisher 1993) is particularly suited to the analysis of individual photons.

Observations in the X-rays and gamma-rays usually have two important characteristics. First, independent, time-tagged photons are collected. A method requiring binning is therefore far from ideal, as it results in a loss of information. Furthermore, binning is ruled out for sources detected with very few photons; for EF, for instance, the required assumption of a Gaussian distribution in each bin is not satisfied in this case. Moreover, the necessary assumptions on the number and sizes of the bins lower the performance of the test (Schwarzenberg-Czerny 1999). Secondly, space observations are often interrupted by "bad time" periods, during which no data are received. Fourier-based methods and Rayleigh's test are seriously affected by this problem. In practice, it means that only periods short compared to the durations of uninterrupted observation can be investigated.

In this paper we present in detail Kuiper's test (Kuiper 1960). This test has been applied to the distribution of solar flares (Jetsu et al. 1997) and to the search for periodicities in Earth impacts (Jetsu 1997; Jetsu & Pelt 2000), but its unique suitability to X-ray and gamma-ray observations has been overlooked. Like Rayleigh's test, it uses discrete events, and can be applied to very faint sources without any a priori assumption. Like EF, it takes into account non-uniform coverage of the phase domain, and can therefore be used when searching for periods that are long compared to the total observation duration (see footnote 1). We study in detail the properties of Kuiper's test for period detection, and particularly its significance level. We concentrate on two important issues: the treatment of discontinuous observations, and the determination of the effective number of trial frequencies when searching for unknown periods. We finally apply the algorithm to the entire ROSAT Position-Sensitive Proportional Counter (PSPC) archive.



2. Kuiper's test 

Kuiper's test (Kuiper 1960) is a variant of the Kolmogorov-Smirnov (KS) test (see Press et al. 1993 and Jetsu & Pelt 1996 for short introductions). Given a sample $\{x_i\}$, $i=1,\ldots,N$, and a probability distribution $\varphi(x)$, $a \le x \le b$, the Kuiper statistic is defined by:

$$V^{\Phi}(\{x_i\}) = \max_{a \le x \le b} \left( S_{\{x_i\}}(x) - \Phi(x) \right) + \max_{a \le x \le b} \left( \Phi(x) - S_{\{x_i\}}(x) \right), \tag{1}$$

where $\Phi(x) = \int_a^x \varphi(y)\,dy$, and $S_{\{x_i\}}(x) = \#(x_i \le x)/N$ is the empirical cumulative distribution of the $\{x_i\}$, $i=1,\ldots,N$ sample ($\#(\ldots)$ meaning "number of ..."). As with KS, the distribution of the Kuiper statistic under the null hypothesis does not depend on the underlying distribution. The null hypothesis is that the $\{x_i\}$, $i=1,\ldots,N$ sample is an outcome of $N$ draws from the $\varphi(x)$ distribution.

Footnote 1: In this paper, "total duration" means the time interval between the start and the end of an observation, including possible gaps.

Kuiper's test can be readily transformed into a test of periodicity in a series of photons by phase-folding their arrival times $\{t_i\}$, $i=1,\ldots,N$ for a given test period $P_0 = 1/f_0$:

$$\psi_i(f_0) = \mathrm{Frac}\left(\frac{t_i - t_0}{P_0}\right), \quad i = 1,\ldots,N \tag{2}$$

where $\mathrm{Frac}(y)$ is the fractional part of $y$, and $t_0$ is an arbitrary time. In the absence of periodicity at frequency $f_0$, the $\psi_i(f_0)$ phases are expected to be distributed uniformly. This can be tested using the Kuiper statistic $V^{U}(\{\psi_i(f_0)\})$, where $U(x) = x$, $0 \le x \le 1$ is the cumulative of a uniform distribution between 0 and 1. A very low probability is evidence that the phases are not uniformly distributed for this frequency, and indicates a periodicity (but see Sect. 5.2).

Unlike KS or EF, the Kuiper statistic is invariant under a shift of the origin for periodic distributions. As a result, $V^{U}(\{\psi_i(f_0)\})$ is invariant under a shift in phase $\{\psi_i(f_0)\} \to \{\psi_i(f_0) + \psi_0 \bmod 1\}$ that would result from a different choice of $t_0$.
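The folding of Eq. (2) and the statistic of Eq. (1) can be sketched in a few lines. This is only an illustrative sketch using the Python standard library; the function names `fold_phases` and `kuiper_statistic` are hypothetical and are not those of the C library mentioned in the conclusions.

```python
def fold_phases(times, freq, t0=0.0):
    """Return the phases psi_i = Frac((t_i - t0) * freq) of Eq. (2)."""
    return [((t - t0) * freq) % 1.0 for t in times]


def kuiper_statistic(phases, cdf=lambda x: x):
    """Kuiper statistic V of Eq. (1) for phases in [0, 1).

    `cdf` is the cumulative distribution expected under the null
    hypothesis; the default is the uniform case U(x) = x.
    """
    # The cdf is monotonic, so sorting its values is the same as
    # sorting the phases first.
    c = sorted(cdf(p) for p in phases)
    n = len(c)
    d_plus = max((i + 1) / n - ci for i, ci in enumerate(c))   # max(S - Phi)
    d_minus = max(ci - i / n for i, ci in enumerate(c))        # max(Phi - S)
    return d_plus + d_minus


if __name__ == "__main__":
    phases = [0.125, 0.375, 0.625, 0.875]
    shifted = [(p + 0.3) % 1.0 for p in phases]
    # Both calls give the same value, illustrating the invariance
    # under a shift of the phase origin.
    print(kuiper_statistic(phases), kuiper_statistic(shifted))
```

Four equally spaced phases give the smallest possible value, 1/N, and shifting all phases by a constant leaves the statistic unchanged.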

2.1. Significance of the Kuiper statistic 

Kuiper (1960) gave the following asymptotic expression for large $N$ to calculate the probability that the Kuiper statistic $V$ exceeds a given value under the null hypothesis:

$$\mathrm{Prob}\left(V > \frac{z}{\sqrt{N}}\right) = \sum_{m=1}^{\infty} 2\,(4m^2z^2 - 1)\,e^{-2m^2z^2} - \frac{8z}{3\sqrt{N}} \sum_{m=1}^{\infty} m^2\,(4m^2z^2 - 3)\,e^{-2m^2z^2} + O\left(\frac{1}{N}\right) \tag{3}$$

This is the false-positive probability (FPP), i.e. the probability of wrongly rejecting the null hypothesis. This formula is systematically used, even though its validity for small $N$ has not been tested (see, e.g., Jetsu & Pelt 1996). In Appendix A we show that the FPP is overestimated by a factor 3 for $N=20$ at the $10^{-7}$ level. For $N < 15$, the probability is underestimated, which wrongly increases the rate of false positives by a factor 30 at the $10^{-7}$ level. Eq. (3) is therefore seriously wrong for small $N$.
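As an illustration, Eq. (3) can be evaluated by truncating the two sums. The sketch below is hedged: the function name `kuiper_fpp_asymptotic` and the truncation at 100 terms are choices made here, not taken from the paper, and the $O(1/N)$ remainder is simply dropped, so the result is only meaningful for large $N$ and for values of $V$ in the upper tail.

```python
import math


def kuiper_fpp_asymptotic(v, n, terms=100):
    """Asymptotic Prob(V > v) of Eq. (3), with z = v * sqrt(n).

    The O(1/N) remainder is neglected and both series are truncated
    after `terms` terms (an arbitrary choice for this sketch).
    """
    z = v * math.sqrt(n)
    first = sum(2.0 * (4.0 * m * m * z * z - 1.0) * math.exp(-2.0 * m * m * z * z)
                for m in range(1, terms + 1))
    second = sum(m * m * (4.0 * m * m * z * z - 3.0) * math.exp(-2.0 * m * m * z * z)
                 for m in range(1, terms + 1))
    return first - 8.0 * z / (3.0 * math.sqrt(n)) * second


if __name__ == "__main__":
    for v in (0.2, 0.3, 0.5):
        print(v, kuiper_fpp_asymptotic(v, 100))
```

For $N = 100$ the value returned at $V = 0.5$ is of order $10^{-20}$, which gives a feeling for how rarely the large-$z$ regime discussed below is reached in large samples.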

Stephens (1965) gives two analytical formulae valid for the lower tail (= 1 - FPP) of the Kuiper statistic distribution:

$$\mathrm{Prob}(V \le z) = N!\left(z - \frac{1}{N}\right)^{N-1}, \quad \text{if } \frac{1}{N} \le z \le \frac{2}{N}, \tag{4}$$

($1/N$ is the minimum of the Kuiper statistic), and, if $2/N \le z \le 3/N$:

$$\mathrm{Prob}(V \le z) = \frac{(N-1)!\,\left(\beta^{N-1}(1-\alpha) - \alpha^{N-1}(1-\beta)\right)}{N^{N-2}\,(\beta - \alpha)} \tag{5}$$

with $\alpha$ and $\beta$ being the two solutions of the quadratic equation: $t^2 - (Nz - 1)\,t + \frac{1}{2}(Nz - 2)^2 = 0$.

Stephens (1965) also gives an analytic formula for the FPP:

$$\mathrm{Prob}(V > z) = \sum_{t=0}^{M} \binom{N}{t} \left(1 - z - \frac{t}{N}\right)^{N-t-1} T_t, \tag{6}$$

with:

$$T_t = y^{t-3} \left( y^3 N - y^2\,t\left(3 - \frac{2}{N}\right) + y\,\frac{t(t-1)\,(3 - 2/N)}{N} - \frac{t(t-1)(t-2)}{N^2} \right), \tag{7}$$

where $y = z + t/N$. Eq. (6) is valid if $z > 1/2$ for even $N$, and if $z > (N-1)/(2N)$ for odd $N$.

Fig. 1. Domains in FPP vs Sample Size covered by the four formulae. The labels indicate the number of the equation used to calculate the probability. The structures in the dark grey area are due to the different validity criteria for even and odd numbers in Eq. (6).

The domains of validity of the three exact equations are shown in Fig. 1, the asymptotic formula being used outside them. The validity condition of Eq. (6) is difficult to satisfy for large $N$. For $N = 100$, the probability that $z > 1/2$ is of the order of $10^{-21}$. For $N=50$, this probability is of the order of $10^{-10}$, making Eq. (6) useful even for intermediate-size samples. Eqs. (4) and (5) represent 40% of the cases for $N=10$, and only 1% for $N=20$. Using the four equations, the FPP is never underestimated. The only remaining discrepancy with the true distribution is in the region $N \sim 40-50$, where the probability is overestimated by a factor 1.5 at the $10^{-7}$ level.
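The three exact formulae can be cross-checked against each other where their domains touch or overlap. The sketch below is illustrative only; the function names are hypothetical, and the upper summation limit `M = floor(N (1 - z))` used for Eq. (6) is an assumption made here (terms beyond it would have a negative base in the power).

```python
import math


def lower_tail_eq4(z, n):
    """Eq. (4): Prob(V <= z), for 1/n <= z <= 2/n."""
    return math.factorial(n) * (z - 1.0 / n) ** (n - 1)


def lower_tail_eq5(z, n):
    """Eq. (5): Prob(V <= z), for 2/n <= z <= 3/n."""
    b = n * z - 1.0
    c = 0.5 * (n * z - 2.0) ** 2
    root = math.sqrt(b * b - 4.0 * c)          # real within the validity range
    alpha, beta = (b - root) / 2.0, (b + root) / 2.0
    num = beta ** (n - 1) * (1.0 - alpha) - alpha ** (n - 1) * (1.0 - beta)
    return math.factorial(n - 1) * num / (n ** (n - 2) * (beta - alpha))


def upper_tail_eq6(z, n):
    """Eqs. (6)-(7): Prob(V > z), for z above the even/odd validity limit."""
    m_max = int(math.floor(n * (1.0 - z)))     # assumed upper limit M
    total = 0.0
    for t in range(m_max + 1):
        y = z + t / n
        t_t = y ** (t - 3) * (y ** 3 * n
                              - y ** 2 * t * (3.0 - 2.0 / n)
                              + y * t * (t - 1) * (3.0 - 2.0 / n) / n
                              - t * (t - 1) * (t - 2) / n ** 2)
        total += math.comb(n, t) * (1.0 - z - t / n) ** (n - t - 1) * t_t
    return total


if __name__ == "__main__":
    # Eqs. (4) and (5) agree at their common boundary z = 2/N.
    print(lower_tail_eq4(0.2, 10), lower_tail_eq5(0.2, 10))
    # For N = 3, z = 0.8 lies in the domains of both Eq. (5) and Eq. (6).
    print(lower_tail_eq5(0.8, 3), 1.0 - upper_tail_eq6(0.8, 3))
```

For very small samples the domains overlap, which gives a convenient consistency check between the lower-tail and upper-tail expressions.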

2.2. Performance of Kuiper's test 

Using extensive sets of simulations, we compare the performance of Kuiper's test with that of the more common Rayleigh test. We create simulated "observations" of periodic sources for different count rates, different signal-to-noise ratios (S/N), and different signal shapes. The phase-folded light curve (hereafter simply "light curve") is defined as the superposition of a constant function (the "continuum") and of the first half-period of a sine function (the "pulse"), covering a fraction $w$ of the period. The S/N is defined as the ratio between the areas of the pulse and of the continuum. We draw events at random from the "pulse+continuum" light curves until a given number of events has been collected in the pulse. For each set of parameters, 10000 light curves are simulated. We then compare the average null-hypothesis probabilities of the two tests.

Fig. 2 shows the results for three signal intensity cases: 20, 100, and 500 events in the pulse. In the three cases, Rayleigh's test is more efficient for $w > 3/4$, while Kuiper's performs better for $w < 1/2$. In the situation most favorable to Rayleigh's test (i.e. $w=1$, 100 events in the pulse), the significance threshold (set arbitrarily to $10^{-4}$) is crossed with a S/N 2 times smaller with Rayleigh's test; this advantage decreases to 15% with $w = 3/4$, and Rayleigh's test is about 30% less sensitive than Kuiper's with $w = 1/4$. Kuiper's test has more difficulty with periodic signals presenting only weak modulations, but the decrease in performance is moderate. It is actually well known that Rayleigh's test is particularly sensitive in the case of broad peaks (Leahy et al. 1983b). On the other hand, some pulsars, in particular in the gamma-rays, have peaks much narrower than those simulated here (Kanbach 1998), in which case Kuiper's test can significantly outperform Rayleigh's.
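The simulated light curves described above can be sketched as follows. This is a minimal illustration, not the simulation code used for Fig. 2: the function names are hypothetical, and the Rayleigh probability uses the usual large-$N$ approximation $\exp(-N\bar{R}^2)$, which is an assumption of this sketch.

```python
import math
import random


def draw_light_curve_phases(n_pulse, snr, w, rng):
    """Draw phases until n_pulse events have been collected in the pulse.

    The continuum is uniform on [0, 1); the pulse is the first half-period
    of a sine covering [0, w). `snr` is the ratio of their areas.
    """
    p_pulse = snr / (1.0 + snr)       # probability that an event is a pulse event
    phases, collected = [], 0
    while collected < n_pulse:
        if rng.random() < p_pulse:
            # Inverse-CDF sampling of a density proportional to sin(pi x / w).
            phases.append(w / math.pi * math.acos(1.0 - 2.0 * rng.random()))
            collected += 1
        else:
            phases.append(rng.random())
    return phases


def rayleigh_log10_fpp(phases):
    """log10 of exp(-N * Rbar^2), the large-N Rayleigh probability."""
    n = len(phases)
    c = sum(math.cos(2.0 * math.pi * p) for p in phases)
    s = sum(math.sin(2.0 * math.pi * p) for p in phases)
    return -(c * c + s * s) / n / math.log(10.0)


def kuiper_v(phases):
    """Kuiper statistic of the phases against the uniform distribution."""
    x = sorted(phases)
    n = len(x)
    return (max((i + 1) / n - xi for i, xi in enumerate(x))
            + max(xi - i / n for i, xi in enumerate(x)))


if __name__ == "__main__":
    rng = random.Random(42)
    sample = draw_light_curve_phases(n_pulse=100, snr=0.5, w=0.25, rng=rng)
    print(len(sample), rayleigh_log10_fpp(sample), kuiper_v(sample))
```

Converting the Kuiper statistic into a probability requires the formulae of Sect. 2.1, which are deliberately left out of this sketch.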

3. Searching for periodicities with Kuiper's test 

3.1. Frequency step 

To search for periodicities, we can calculate the Kuiper statistic over a set of test frequencies. The Kuiper periodogram (or, more appropriately, "frequencygram") is defined as:

$$S(f) = \log_{10} \mathrm{Prob}\left(V > V^{U}(\{\psi_i(f)\})\right), \quad f_1 \le f \le f_2 \tag{8}$$

where $V^{U}(\{\psi_i(f)\})$ is the Kuiper statistic calculated for a frequency $f$. The logarithm is applied to highlight the candidate periods. Given a periodic signal with a frequency $f_0$, Kuiper's test may present harmonic and subharmonic peaks at frequencies $i \cdot f_0$ and $f_0/i$ (plus their harmonics), $i$ being any small integer.

To avoid missing significant peaks, $S(f)$ must be calculated for frequencies sufficiently close to each other. Assuming a source emitting a photon every $P_0 = 1/f_0$ seconds, the phases of the first and last photons evaluated at a frequency $f_1 = f_0 + \Delta f$ close to $f_0$ differ by $\Delta\varphi \simeq T \cdot \Delta f$, where $T$ is the total duration of the observation. The coherence is preserved if:

$$\Delta\varphi \ll 1 \;\Rightarrow\; \Delta f \ll 1/T \tag{9}$$

Therefore $S(f)$ must be calculated at equidistant frequencies, with a step depending only on $T$. We define the oversampling parameter $k$:

$$\Delta f = \frac{1}{kT} \tag{10}$$

Eq. (9) therefore becomes $k \gg 1$. If this inequality is not satisfied, significant peaks can be missed, or underestimated by sampling them too far from their central frequencies. On the other hand, the CPU time is proportional to $k$. Reasonable values of $k$ are in the range 20-50 (but see Sect. 3.2).
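The frequency grid implied by Eq. (10) can be built as in the sketch below. The function name `frequency_grid` and the small rounding guard are choices made for this illustration.

```python
import math


def frequency_grid(f_min, f_max, duration, k=20):
    """Equidistant test frequencies with step 1 / (k * duration), Eq. (10)."""
    step = 1.0 / (k * duration)
    # The small epsilon guards against floating-point rounding when
    # (f_max - f_min) is an exact multiple of the step.
    n = int(math.floor((f_max - f_min) / step + 1e-9)) + 1
    return [f_min + j * step for j in range(n)]


if __name__ == "__main__":
    # Example: a 1000 s observation searched between 1/(T/3) and 1/(100 s).
    grid = frequency_grid(3.0 / 1000.0, 1.0 / 100.0, duration=1000.0, k=20)
    print(len(grid), grid[0], grid[-1])
```

The number of test frequencies, and hence the CPU time, grows linearly with both $k$ and the total duration $T$.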




Fig. 2. Sensitivities of Kuiper's and Rayleigh's tests as a function of the S/N between the pulse and the continuum. (a) 20 counts in the pulse; (b) 100 counts; (c) 500 counts. In each graph, the curves are (from top to bottom) w=1, w=3/4, w=1/2, w=1/4. The w=3/4 and w=1/4 curves are highlighted in light grey for visual identification. The solid line is Kuiper's test; the dashed line Rayleigh's.

3.2. Number of trials 

$\mathrm{Prob}(V > V^{U}(\{\psi_i(f)\}))$ is the probability that $P = 1/f$ is not a period of the source for a single draw of a Kuiper statistic. If $S(f)$ is calculated for a set of frequencies $f_j$, $j=1,\ldots,n$, and assuming all the frequencies are independent, we have:

$$\mathrm{Prob}(\exists j \mid V_j > z,\ j = 1,\ldots,n) = 1 - \mathrm{Prob}(V \le z)^n. \tag{11}$$

The above equation can be approximated by:

$$\mathrm{Prob}(\exists j \mid V_j > z,\ j = 1,\ldots,n) \simeq n \cdot \mathrm{Prob}(V > z), \tag{12}$$

under the condition $n \cdot \mathrm{Prob}(V > z) \ll 1$. We can therefore correct our $S(f)$ estimator for the number of trials:

$$\tilde{S}(f) = S(f) + \log_{10} n \tag{13}$$

As $n$ is proportional to $k$, Eq. (13) may destroy the significance of some peaks if the large $k$'s required to find the peaks are used. However, $S(f)$ is strongly correlated on scales $\sim \Delta f$ and below, and we have $\mathrm{Prob}(V > z) \le \mathrm{Prob}(\exists j \mid V_j > z,\ j=1,\ldots,n) \le n \cdot \mathrm{Prob}(V > z)$, the exact value being very difficult to calculate. This problem affects all period search algorithms, and has been addressed using extensive simulations for very specific cases (e.g., Horne & Baliunas 1986; de Jager et al. 1988, 1989). We propose here a simple and workable semi-analytical method to completely correct for the choice of $k$.
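The relation between the exact expression for independent trials, Eq. (11), and its first-order approximation, Eq. (12), can be checked numerically. The sketch below is illustrative; the function names are hypothetical.

```python
import math


def fpp_any_exact(p_single, n):
    """Eq. (11): probability that at least one of n independent trials exceeds z."""
    # Written with log1p/expm1 to remain accurate for very small p_single.
    return -math.expm1(n * math.log1p(-p_single))


def fpp_any_approx(p_single, n):
    """Eq. (12): first-order approximation, valid when n * p_single << 1."""
    return n * p_single


def trials_corrected_log10(s, n):
    """Eq. (13): log10 probability corrected for n independent trials."""
    return s + math.log10(n)


if __name__ == "__main__":
    p, n = 1e-6, 1000
    print(fpp_any_exact(p, n), fpp_any_approx(p, n))
    print(trials_corrected_log10(-8.0, n))
```

With $n \cdot \mathrm{Prob}(V > z) = 10^{-3}$ the two expressions differ by about 0.05%, and the approximation is always the larger of the two.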

We choose an arbitrary threshold $V_*$ such that $n \cdot \mathrm{Prob}(V > V_*) \ll 1$. We then simulate $m$ sets of random photons, and calculate $\max_{j=1,\ldots,n} V^{U}(\{\psi_i(f_j)\})$ over all $f_j$ for each of the $m$ simulations. The probability that, for a given simulation, $\max_{j=1,\ldots,n} V^{U}(\{\psi_i(f_j)\}) > V_*$ can now be estimated as $\#(\max_{j=1,\ldots,n} V^{U}(\{\psi_i(f_j)\}) > V_*)/m$. This is the left-hand side of Eq. (12), with $z = V_*$. We can therefore estimate the effective number of frequencies, $n_\mathrm{eff}$:

$$n_\mathrm{eff} = \frac{\#\left(\max_{j=1,\ldots,n} V^{U}(\{\psi_i(f_j)\}) > V_*\right)}{m \cdot \mathrm{Prob}(V > V_*)} \tag{14}$$

$n_\mathrm{eff}$ can be understood as the number of independent frequencies among the $f_j$'s. Approximating $\#(\max_{j=1,\ldots,n} V^{U}(\{\psi_i(f_j)\}) > V_*)$ with a Poisson distribution, the uncertainty on $n_\mathrm{eff}$ is:

$$\Delta n_\mathrm{eff} = \frac{\sqrt{\#\left(\max_{j=1,\ldots,n} V^{U}(\{\psi_i(f_j)\}) > V_*\right)}}{m \cdot \mathrm{Prob}(V > V_*)} \tag{15}$$

The corrected periodogram is then deduced from Eq. (13):

$$\hat{S}(f) = S(f) + \log_{10}(n_\mathrm{eff}) \tag{16}$$

Provided $\hat{S}(f) \ll 0$, $10^{\hat{S}(f)}$ is the probability that the source has no $1/f$ period, if $n$ tests are performed. This method is quite general, and can be easily adapted to other statistical tests.
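The estimate of Eqs. (14) and (15) can be sketched as below for a continuous observation. This is a hedged illustration: the function names are hypothetical, photons are drawn uniformly over the observation (no GTIs), and the single-trial probability $\mathrm{Prob}(V > V_*)$ is assumed to be supplied by the caller from the formulae of Sect. 2.1.

```python
import math
import random


def kuiper_v(phases):
    """Kuiper statistic of the phases against the uniform distribution."""
    x = sorted(phases)
    n = len(x)
    return (max((i + 1) / n - xi for i, xi in enumerate(x))
            + max(xi - i / n for i, xi in enumerate(x)))


def count_exceedances(n_photons, duration, freqs, v_star, m, rng):
    """Number of simulations, out of m, whose largest V over freqs exceeds v_star."""
    count = 0
    for _ in range(m):
        times = [rng.uniform(0.0, duration) for _ in range(n_photons)]
        v_max = max(kuiper_v([(t * f) % 1.0 for t in times]) for f in freqs)
        if v_max > v_star:
            count += 1
    return count


def effective_trials(count, m, p_single):
    """n_eff of Eq. (14) and its Poisson uncertainty, Eq. (15)."""
    denom = m * p_single
    return count / denom, math.sqrt(count) / denom


if __name__ == "__main__":
    # Purely illustrative numbers: 50 exceedances in 10000 simulations
    # for a single-trial probability of 1e-4.
    print(effective_trials(50, 10000, 1e-4))
```

Because the denominator contains $m \cdot \mathrm{Prob}(V > V_*)$, a small single-trial probability requires a correspondingly large number of simulations to keep the Poisson uncertainty under control.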

In principle, the correction factor $R = n_\mathrm{eff}/n$ can depend on $k$, the number of photons, the frequency range, the observation duration, and so on, which means that $R$ should be estimated separately for all observations. As this is computationally expensive for large numbers of observations (see Sect. 5), we approximate $R$ as a function of $k$ only. Details are presented in Appendix B. In the limit $k \to 0$, Kuiper's tests are independent from each other, while in the limit $k \to \infty$, $n_\mathrm{eff}$ reaches a plateau. We can therefore write the approximation:

$$R(k) = \frac{1}{1 + r_0 \cdot k} \tag{17}$$

Fig. 3 shows the correction factor $R(k)$ for five sets of simulated observations with different numbers of photons and different GTIs (see Sect. 3.3). In all cases, 10000 simulations have been made for each $k$, and we set $V_*$ so that $n \cdot \mathrm{Prob}(V > V_*) = 0.1$, which seems sufficiently small. The behavior of $R(k)$ follows Eq. (17) quite well, but the curves do differ significantly from each other, albeit moderately. We adopt in the following a unique value $r_0 = 0.0815$, which gives $n_\mathrm{eff}/n = 0.38$ for $k=20$, or $n_\mathrm{eff}/n = 0.197$ for $k=50$. This value corresponds to the upper envelope of the curves of Fig. 3.
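The quoted values follow directly from Eq. (17) with $r_0 = 0.0815$, as the short sketch below shows. The function names are hypothetical; `corrected_log10_fpp` simply combines Eqs. (16) and (17).

```python
import math


def correction_factor(k, r0=0.0815):
    """R(k) = n_eff / n = 1 / (1 + r0 * k), Eq. (17)."""
    return 1.0 / (1.0 + r0 * k)


def corrected_log10_fpp(s, n, k, r0=0.0815):
    """Eq. (16) with n_eff = R(k) * n."""
    return s + math.log10(correction_factor(k, r0) * n)


if __name__ == "__main__":
    print(round(correction_factor(20), 2))    # 0.38
    print(round(correction_factor(50), 3))    # 0.197
    print(corrected_log10_fpp(-8.0, 1000, 20))
```

Since $n$ grows linearly with $k$ while $R(k)$ falls roughly as $1/k$ for large $k$, the product $n_\mathrm{eff} = R(k)\,n$ saturates, which is the plateau mentioned above.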

3.3. Discontinuous observations 

In high-energy observations, the photons are collected during limited periods of time called "good time intervals" (GTIs). Their main effect is to make the cumulative distribution of the phases of the photons coming from a constant source depart from $U(x) = x$, because the phase intervals are not uniformly covered. This creates strong aliases in FFTs and Rayleigh's test; EF can take into account the actual exposure time of each phase bin, but with some limitations due to the binning.

Kuiper's test is similar to EF in spirit, and even allows a perfect correction for the expected non-uniformity. Like the KS test, Kuiper's test is independent of the shape of the putative parent distribution. Thus we calculate exactly, for each frequency, the expected distribution $\xi(x)$ of the phases for a constant source. This can be done by folding the GTIs according to the period boundaries. $\xi(x)$ being piecewise constant, its cumulative $E(x) = \int_0^x \xi(y)\,dy$ can be calculated exactly. $\Phi(x)$ in Eq. (1) is then replaced by $E(x)$ to calculate the Kuiper statistic.

Fig. 3. Ratio $R(k) = n_\mathrm{eff}/n$ as a function of $k$ for five different "observations": two continuous (20 and 200 photons) ones, and three corresponding to obs. RP300093N00, RP300262N00, and RP700232N00 with 100, 1596, and 377 photons. The grey area shows the uncertainties for one of the five curves. The dashed line has a slope -1.
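One way to obtain $E(x)$ exactly is to count, for each GTI, the time spent at phases below $x$. The sketch below is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the phase origin is taken at $t_0 = 0$, and GTIs are given as `(start, stop)` pairs in the same time system as the photons.

```python
import math


def exposure_below(t, x, period):
    """Signed time between 0 and t spent at phases <= x (phase origin at t = 0)."""
    cycles = math.floor(t / period)
    frac = t / period - cycles
    return (cycles * x + min(frac, x)) * period


def expected_cdf(x, gtis, period):
    """E(x): fraction of the total exposure at phases <= x for a constant source."""
    total = sum(stop - start for start, stop in gtis)
    below = sum(exposure_below(stop, x, period) - exposure_below(start, x, period)
                for start, stop in gtis)
    return below / total


def kuiper_statistic_gti(times, gtis, period):
    """Kuiper statistic of the folded photons against E(x) instead of U(x)."""
    c = sorted(expected_cdf((t / period) % 1.0, gtis, period) for t in times)
    n = len(c)
    return (max((i + 1) / n - ci for i, ci in enumerate(c))
            + max(ci - i / n for i, ci in enumerate(c)))


if __name__ == "__main__":
    # A single GTI covering only the first half of a 100 s trial period.
    gtis = [(0.0, 50.0)]
    photons = [6.25, 18.75, 31.25, 43.75]      # evenly spread within the GTI
    print(expected_cdf(0.25, gtis, 100.0))
    print(kuiper_statistic_gti(photons, gtis, 100.0))
```

In this example the photons are perfectly regular within the exposed half of the period, so the statistic takes its minimum value 1/N, whereas a comparison with $U(x) = x$ would flag the unexposed half of the phase range as a spurious signal.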

Figure 4 compares $S(f)$ to $Z(f) = \log_{10} \mathrm{Prob}(R > R_0(f))$, $R_0(f)$ being the Rayleigh statistic, in three different cases. The first case is a simulated 1000 s, 1000-photon observation of a constant source. No significant peak is observed in Kuiper's test down to the absolute minimum frequency, 0.001 Hz = 1/(1000 s), while Rayleigh's test produces several very significant spurious peaks. The second case is a real anonymous 433-photon source in ROSAT obs. RF500043A01, an observation consisting of 8 GTIs. Again, absolutely no significant peak is observed in Kuiper's test down to the absolute minimum frequency, while Rayleigh's test produces many very deep spurious peaks. The third case is a simulated 300-photon source with a period $P = 10^4$ s with the GTIs of ROSAT obs. RP600121N00. The photons have been drawn from a "w=1/4" light curve with 150 photons in the pulse. This observation totaled 44 733 s spread over 1 month in 40 separate GTIs. The longest GTI lasted 3118 s, 38 of the 40 GTIs lasting half an hour or less. The peak at $f = 10^{-4}$ Hz has comparable depth in both tests. However, because many contaminating peaks have an amplitude comparable to that of the true period, some even overwhelming it, it is impossible to retrieve the $10^4$ s period using Rayleigh's test. In the Kuiper periodogram, the peak at $f = 10^{-4}$ Hz dominates all other peaks with a probability ratio larger than 20 000. Furthermore, the second and third peaks are located respectively at $f/4$ and $2f$, and are very probably aliases of the true frequency.

Fig. 4. Effect of the GTIs on Rayleigh's (top) and Kuiper's (bottom) tests. (a-b) Simulated 1000 s 1000-photon observation. (c-d) 433-photon source in ROSAT obs. RF500043A01 (8 GTIs). (e-f) Simulated 300-photon periodic source with the GTIs of obs. RP600121N00. The dashed line indicates the location of the true period $P = 10^4$ s.

Fig. 5. Kuiper periodograms of EX Hya (a) and UW Pic (b). The grey lines show the 3953 s and 8047 s periods respectively. The dotted lines indicate the $10^{-4}$ significance threshold. In (a) the short-dash and long-dash lines show respectively the 3 · 3953 s and the possible 98-min period. In both panels the insets show the results of Rayleigh's test.



4. Application to known periodic sources 

In a search for new periodic-source candidates in the ROSAT PSPC archive (see Sect. 5), we found two known periodic sources, which illustrate particularly well the power of Kuiper's test.

4.1. EX Hya 

EX Hya is a cataclysmic variable of type DQ Her in which a 4020 s (67 min) period has been claimed by Kruszewski et al. using an Einstein observation. This period was later confirmed by Cordova et al. (1985) using a very long EXOSAT observation. Another period of 5880 s (98 min) is claimed to be present in both optical (Mumford 1967) and X-ray (Cordova et al. 1985) light curves. Fig. 5a shows the periodograms for EX Hya in ROSAT obs. RP300093N00, a 28 340 s observation (i.e. only about seven 4020 s periods), with a 15 542 s effective exposure time split into 12 GTIs. Rayleigh's test produces a forest of spurious peaks. On the other hand, a very significant peak (Prob $< 10^{-20}$) is easily recovered with Kuiper's test at $P = 3953$ s, very close to the "official" period. The 98-min period is not found here, but there is a second peak at about three times the 67 min period, extremely close (within 1%) to 2 · 98 min. This peak could be an alias of both periods. The existence of the optical 98 min period in the X-ray domain is therefore unclear, and deserves further study.



4.2. UW Pic 

UW Pic (RX J0531.5-4624) is a cataclysmic variable of type AM Her with an optical period of 8010 s (Reinsch et al. 1994). A phase folding of the ROSAT All-Sky Survey light curve at the known period suggests the existence of the period in the X-rays. Fig. 5b shows the Kuiper periodogram for UW Pic in ROSAT obs. RP300334N00, which exhibits a very significant peak (Prob $< 10^{-6}$) at $P = 8047$ s, even though the observation consists of 29 GTIs over 2.3 days, totalling 34 501 s. Again, Rayleigh's test is completely unable to recover the period.

5. Period search in the ROSAT PSPC archive 

We apply Kuiper's test to the entire set of 4638 ROSAT PSPC 
observations, treating them completely separately. For simplic- 
ity, we did not attempt to combine distinct observations of a 
single object. We search for periods in a range from 100 s up to 
a third of the total duration of the observation, using k = 20. 

5.1. Source extraction 

Source detection has been performed following the standard EXSAS spatial analysis procedure (Zimmermann et al. 1998) on a per-observation basis using standard parameters. Overlapping sources were extracted twice: once ignoring the second source, and once excluding it. We ended up with a total of 186572 sources, distinct or not. To obtain optimum sensitivity, we extracted the photons up to a larger radius in high signal-to-noise ratio (S/N) sources than in low S/N sources (1.5 times the source full width at half maximum compared to 0.65 times). We extract at most 2000 photons per source to limit computation time. We did not apply barycentric correction here, because the effect is negligible for the low frequencies and relatively short observations considered here.

5.2. Contaminations in the PSPC observations 

The ideal situation of a perfectly constant source is often not realized in the X-ray domain. Two types of contamination affect $S(f)$: extrinsic periodicities, and aperiodic variability.

Spurious peaks can be produced in the Kuiper periodogram by extrinsic phenomena. Four different kinds of contamination affect ROSAT PSPC data. One is the wobble of the ROSAT satellite: its attitude oscillates around the target, masking and unmasking some of the sources behind the PSPC window support structure with a period ~400 s ($f_\mathrm{wob} = 0.0025$ Hz). The spacecraft's orbit also produces contamination. While the gaps due to observing constraints are completely taken care of with our method (see Sect. 3.3), part of the background depends on the position along the orbit (e.g., the scattered solar X-ray background; Snowden et al. 1994), and induces a periodic variability at the period of the spacecraft's revolution, i.e. 5760 s ($f_\mathrm{orb} = 1.7361 \times 10^{-4}$ Hz). We also found in about 50 cases a period of 86 400 s, obviously of extrinsic origin. In a handful of observations, many objects presented very significant peaks at 0.003 Hz. The fact that distinct objects present the same period clearly indicates a non-astrophysical origin, which we could not identify. The contaminations combine with each other, and peaks at $f_\mathrm{wob} + i \cdot f_\mathrm{orb}$, $i$ being any small integer, are frequent.

Knowing the contaminating frequencies, we could check whether harmonic and subharmonic (see Sect. 3) peaks can dominate the peak at the fundamental frequency. No subharmonic peak has been found to dominate the fundamental, but harmonic peaks occasionally do. Thus there is a risk of misidentifying a harmonic peak for the fundamental.

Aperiodic variability is also a serious difficulty when dealing with long periods. When trial periods are comparable to the source's shortest variability time scale, or longer, the effect of aperiodic variability cannot cancel itself out over the successive phases, and strongly affects $S(f)$, preventing period detection over large ranges of frequencies. This is analogous to the red-noise contamination in Fourier power spectra.
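The extrinsic frequencies listed above, and their combinations, can be tabulated to flag suspicious peaks. The sketch below is only illustrative: the function names, the number of harmonics and combinations considered, and the 1% relative tolerance are all assumptions made here; the rejection ranges actually used were derived from a histogram of significant frequencies.

```python
F_WOB = 0.0025            # Hz, wobble frequency (period ~400 s)
F_ORB = 1.0 / 5760.0      # Hz, orbital frequency (about 1.7361e-4 Hz)
F_DAY = 1.0 / 86400.0     # Hz, the 86 400 s period
F_UNKNOWN = 0.003         # Hz, unidentified contamination


def contaminating_frequencies(max_i=3, max_harmonic=3):
    """Contaminating frequencies, their harmonics, and f_wob + i * f_orb."""
    freqs = {F_DAY, F_UNKNOWN}
    for h in range(1, max_harmonic + 1):
        freqs.add(h * F_WOB)
        freqs.add(h * F_ORB)
    for i in range(-max_i, max_i + 1):
        freqs.add(F_WOB + i * F_ORB)
    return sorted(freqs)


def is_contaminated(f, rel_tol=0.01):
    """True if f lies within rel_tol of a tabulated contaminating frequency."""
    return any(abs(f - c) <= rel_tol * c for c in contaminating_frequencies())


if __name__ == "__main__":
    for f in (0.0025, 0.0025 + F_ORB, 0.0071):
        print(f, is_contaminated(f))
```

A fixed relative tolerance is a crude stand-in for empirically determined rejection ranges, and would need tuning on real data.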

5.3. Candidate selection 

Several thousand sources exhibit significant frequencies at the $10^{-4}$ level (corrected for the number of trials), the vast majority of them being due to contamination. We applied several filters to reduce the number of candidates. We first rejected all frequencies in broad ranges around the contaminating frequencies discussed above, and their harmonics. The ranges have been determined using a histogram of all significant frequencies. Aperiodic variability has been dealt with in two steps. First, we discarded objects for which $S(1/T_0) < -10$, $T_0$ being the total observation duration. We also rejected all objects for which more than 10 significant frequencies were found. Finally, we eliminated many of the remaining objects after visual inspection, because several close peaks in $S(f)$ had similar depths just below the threshold; we ended up with 30 objects. This last step is however somewhat subjective.

5.4. Candidate periodic sources 

Table 2 lists the properties of the 30 remaining sources. Fig. 6 shows $S(f)$ for the 28 new candidates. A search over 180000 objects produces about 18 spurious sources at the $10^{-4}$ level, assuming that all contaminations are perfectly identified. The periodicities must therefore be confirmed using distinct data sets. Six candidates have been observed several times using ROSAT PSPC with adequate observation durations, and are discussed below. The 22 other sources require additional observations before their status can be settled, and remain candidates.
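The expected number of spurious detections quoted above is simply the number of objects times the significance threshold. A minimal sketch, with a hypothetical function name:

```python
def expected_spurious(n_sources, threshold):
    """Expected number of chance detections among n_sources at a given threshold."""
    return n_sources * threshold


if __name__ == "__main__":
    print(expected_spurious(180000, 1e-4))
```

With roughly 18 chance detections expected and 28 new candidates retained, independent confirmation is clearly needed before any single candidate can be considered secure.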

V603 Aql (Source #17) is a classical nova for which a period of 63 min was found using Einstein IPC data (Udalski & Schwarzenberg-Czerny 1989). Using the same data, Eracleous et al. (1991) possibly find only its first harmonic, remaining cautious about its reality. We do not find the candidate period in either of the two long ROSAT PSPC observations. Similarly, Borczyk et al. (2003), combining 27 short observations, did not find any evidence of X-ray periodicity. We found however a very significant peak at $f \simeq 0.00199$ Hz (i.e. $P \simeq 503.2$ s), a region not explored by Borczyk et al. (2003), in obs. RP300262N00. This observation lasted 1736 s, i.e. a little more than 3 cycles. Such a small number of cycles could result from a chance occurrence of three similar successive flares. However, a peak near this frequency is found in at least two other observations, but with a lower significance. The repeated occurrence of the peak nevertheless makes the 503 s period intriguing. Its absence in most observations could mean that it is only a characteristic variability time scale, without long-term coherence, or that the periodic modulation is not persistent.

MRK 841 (Source #20) is a Seyfert 1 galaxy, with a candidate period of 240.68 s. A similar peak is found in two out of nine other observations, which were rejected because of red noise. If Source #11 ($P = 1741.89$ s) is really 1RXS J172136.9+431045, it is also an active galactic nucleus (AGN). AGN do not present periodic variability in general, but, because of their similarity to X-ray binaries, (quasi-)periodicities are not excluded. There have been several claims of the existence of periodicity in AGN (e.g., Iwasawa et al. 1998 in the Seyfert 1 galaxy IRAS 18325-5926, but see Benlloch et al. 2001).

Source #22 is the symbiotic star AG Dra, and shows a periodicity at $P = 234$ s. The peak is quite narrow, and there is no evidence of contamination in the region surrounding the frequency. AG Dra is a known X-ray source (Anderson et al. 1981) with two probable periods of about 350 and 550 days in the optical (Friedjung et al. 2003). No periodicity has ever been reported in the X-rays. The period was completely absent in the few other ROSAT PSPC observations. If real, the periodic component must be non-persistent.

Source #23, with a candidate period of 161.47 s, has been observed three times in total. A peak at the same frequency, albeit below our significance threshold, is found in the two other observations, making it a very good candidate. This object is, or is close to, the white dwarf WD 1620-391, which appears slightly extended in the ROSAT image. No periodicity has ever been reported for this object.

Source #25 ($P = 142.47$ s) has been observed several times, without any confirmation of the candidate period.

Source #28 presents low-significance peaks around the candidate 116.9 s period in other observations, but the weakness of the source makes it impossible to settle the case.

Fig. 6. Kuiper periodograms of the 28 candidates listed in Table 2. The candidate frequency is highlighted with a grey line. The horizontal dotted line is the $10^{-4}$ significance limit corrected for the number of trials. Contamination related to the wobble frequency is indicated with a black circle. Contamination related to the revolution frequency is indicated with a black triangle. The numbers are the "ID" column in Table 2.



6. Conclusions 

Kuiper's test shows very interesting properties for the search for long-period periodic objects. Its ability to cope very naturally, without any hidden assumption, with complex GTIs is unique. Compared to Rayleigh's, Kuiper's test performs better for narrow-peaked light curves. Kuiper's test is quite sensitive to both subharmonics and harmonics of the fundamental frequency, but usually identifies the fundamental correctly. Kuiper's test is particularly well suited to X-ray missions like XMM-Newton and Chandra, to high-energy gamma-ray satellites like GLAST, and to Cherenkov telescopes.

The semi-analytical method we propose here to correct the 
false-positive probability in case of a search over a range of 
frequencies should be quite useful in practice, not only for 
Kuiper's test, but also for other tests, as its principle can be 
easily adapted. It has the advantage of simplicity, and of being 
based on sound probability principles. 

Of the 28 candidate periodic sources, 6 could be cross-checked using other ROSAT PSPC observations. Good or partial confirmation of the existence of periodicities is found in 3 of these objects, and there is total absence of confirmation in 3 objects. This does not necessarily imply a "confirmation of absence". It must be remembered that X-ray sources are quite often strongly variable, and that a periodic signal may remain undetected in some observations, even though the observing conditions seem adequate. For instance, Israel et al. (2000) report the detection of a periodic signal in the X-ray pulsar 2E 0053.2-7242 in only one out of nine ROSAT PSPC observations, the source having dimmed by a factor > 6 between the different observations.

The possibility that extrinsic contamination or statistical
flukes explain some, or even most, of the candidate periods
must be considered seriously. Firm identification of the
candidates as periodic sources will be contingent upon the
detection of the periods in independent data sets. The build-up
of large X-ray archives from XMM-Newton and Chandra
makes it quite probable that new observations will be available
for a fair number of these sources in the near future.

A C library implementing the algorithms discussed in this 
paper is available from the author. 
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Fig. A.1. Fractions of false positives compared to the
expectation for 7 different probability thresholds, from $10^{-1}$ to $10^{-7}$,
and different sample sizes. The empty circles have been calculated
using Eq. (3). The black symbols use different RNGs (see
text).
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Appendix A: Distribution of the Kuiper statistic 

We test the formulae presented in Sect. 2.1 using Monte Carlo simulations
of the null hypothesis. Fig. A.1 shows the fraction of test results
with a probability smaller than $10^{-1}, \ldots, 10^{-7}$ respectively, as a
function of the sample size. These fractions would asymptotically reach
$10^{-1}, \ldots, 10^{-7}$ respectively if we had exact equations. $10^{9}$ simulations
have been performed for each sample size. The only discrepancies are
found for sample sizes in the range 30-100. The asymptotic formula
overestimates the probability of the null hypothesis by about 40% for
a 40-member sample at the $10^{-7}$ level. The overestimation becomes
unimportant for sample sizes larger than 100. An overestimation of
the FPP is however not serious, since we are chiefly concerned with
avoiding false positives. There is no evidence of underestimation of
the FPP, which would be a more serious issue, as it would lead to false
positives.
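
A scaled-down version of such a null-hypothesis check can be sketched in a few lines of Python. The asymptotic tail probability used below is the widely quoted series with Stephens' finite-sample modification, in the form given by Press et al.; it is an assumption that this matches the equations of Sect. 2.1 term by term, and the function names are illustrative only. Python's `random` module is based on the Mersenne Twister MT19937.

```python
import math
import random


def kuiper_uniform(phases):
    """Kuiper statistic V = D+ + D- against a uniform phase distribution."""
    n = len(phases)
    ordered = sorted(phases)
    d_plus = max((i + 1) / n - p for i, p in enumerate(ordered))
    d_minus = max(p - i / n for i, p in enumerate(ordered))
    return d_plus + d_minus


def kuiper_fpp_asymptotic(v, n, max_terms=100):
    """Asymptotic probability of exceeding V for a sample of size n."""
    lam = (math.sqrt(n) + 0.155 + 0.24 / math.sqrt(n)) * v
    if lam < 0.4:
        return 1.0  # the series is not useful here; the probability is ~1
    total = 0.0
    for j in range(1, max_terms + 1):
        a = 2.0 * (j * lam) ** 2
        term = (2.0 * a - 1.0) * math.exp(-a)  # (4 j^2 lam^2 - 1) exp(-2 j^2 lam^2)
        total += term
        if abs(term) <= 1e-16 * abs(total):
            break
    return min(1.0, max(0.0, 2.0 * total))


def false_positive_fraction(n_photons, threshold, n_sim, rng):
    """Fraction of pure-noise samples whose FPP falls below `threshold`."""
    hits = 0
    for _ in range(n_sim):
        phases = [rng.random() for _ in range(n_photons)]
        if kuiper_fpp_asymptotic(kuiper_uniform(phases), n_photons) < threshold:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    rng = random.Random(2004)  # arbitrary seed, for reproducibility only
    print(false_positive_fraction(100, 0.1, 4000, rng))
```

If the formula is adequate for the chosen sample size, the returned fraction should be statistically compatible with the threshold. Probing thresholds as small as $10^{-7}$ requires far more simulations than this sketch can reasonably run.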

Empty circles in Fig. A.1 have been calculated using Eq. (5) only.
The overestimation reaches a factor 3 for N = 20. More importantly, the
FPPs are underestimated for N < 15. The factor reaches 30 for N = 10 at
the $10^{-7}$ level, and is close to 300 at the $10^{-8}$ level.

In principle, one should be cautious about simulations exploring 
tails of probability distributions, since the random number generators 
(RNGs) may present defects in these regimes. This does not seem to 
be a problem here. Indeed, the simulated FPPs match perfectly the 
expected ones when either an exact formula is used, or when N is 
large enough if Eq. is used. Moreover, the simulations that end up 
in the very tail of the Kuiper-statistic distribution are not at all in the 
tails of the uniform distributions used to generate the list of photons. 

We further checked the validity of the simulations by comparing
different RNGs. The curves used the MT19937 generator
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Table 1. Properties of the confirmed (first two rows) and candidate periodic sources. ROSAT coordinates are J2000. The
frequency is expressed in mHz. N is the number of photons. S(f) is the decimal logarithm of the FPP corrected for the number of
trials. $N_F$ is the number of frequencies searched. "ID" refers to the numbers in Fig. 5. Identifications in italics are tentative.
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(Matsumoto & Nishimura 1998). The black triangles at N = 40 used
the RANLUX generator at luxury level 2 (Lüscher 1994), and the
black squares at N = 50 used the (very poor) standard UNIX RNG
(C function rand())$^2$. The different RNGs produce perfectly compatible
results within the statistical fluctuations due to the limited number
of simulations, which reach 10% at the $10^{-7}$ level, and 1% at the $10^{-5}$
level.

Appendix B: Details on the R correction factor 

We explore in more detail the properties of the correction factor R 
with respect to other parameters. Fig. B.1 shows the effect of the
frequency range. We cut the set of trial frequencies into chunks of 1000
frequencies for two 100-photon simulated observations covering the 
GTIs of obs. RP281045N00 and RP808035A01. We used k = 20, 
and performed 10000 simulations in each case. While not constant, 
R changes moderately, without any visible trend. 

We test the dependence of R on the number of photons N, the
number of trial frequencies $N_F$ (which is roughly proportional to the
observation duration), and a measure of the importance of gaps in the
observation, given by the ratio between the "on-time" (i.e. the sum of
the individual GTI durations) and the total duration. Fig. B.2 shows

2 These algorithms can be found in the GNU Scientific Library at 
http://sources.redhat.com/gsl/
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Fig. B.1. Correction factor R for successive chunks of 1000
frequencies for two simulated 100-photon observations using
the GTIs of obs. RP281045N08 (a) and RP800035A01 (b).
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Fig. B.2. Correction factor R for the 30 sources from Table 1
as a function of N (a), $N_F$ (b), and the on-time vs total duration
ratio (c). The dashed line is the best linear fit.

R for the 30 sources from Table 1 as a function of N, $N_F$, and the
on-time vs total duration ratio for k = 20, with 1000 simulations per
observation. A significant correlation is found only between R and N,
with R increasing with N; Spearman's rank correlation coefficient gives
a 1% probability of chance occurrence.
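
The two quantities used in this comparison are simple to compute. The sketch below evaluates the on-time vs total duration ratio of a set of GTIs and Spearman's rank correlation coefficient between two lists of values. It is an illustration only: the (start, stop) GTI layout and the function names are assumptions, the numbers in the usage example are invented, and the chance-occurrence probability quoted in the text is not computed.

```python
import math


def on_time_fraction(gtis):
    """Sum of the GTI durations divided by the total observation span."""
    on_time = sum(stop - start for start, stop in gtis)
    total = max(stop for _, stop in gtis) - min(start for start, _ in gtis)
    return on_time / total


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's coefficient: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented values, for illustration only (not taken from Table 1).
    print(on_time_fraction([(0.0, 10.0), (90.0, 100.0)]))
    print(spearman_rho([50, 120, 300, 800], [0.28, 0.31, 0.30, 0.40]))
```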

The average R value for k = 20 is 0.335, with an rms, corrected
for the contribution of the finite number of simulations, of 0.066. After
removal of the best linear fit to Fig. B.2b, the rms becomes 0.053. Hence
most of the scatter remains unexplained, and probably results from the
distribution of the GTIs.

All parameters except k can be neglected as a first approximation,
and using R(k) is justified. A more detailed approximation would
make use of both k and N. Unless one is searching for periodic sources
in a large number of observations, which is the case in this work, the
correct approach is nevertheless to calculate R specifically for the
observation at hand.
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